INTRODUCTION



The concept of a change score has considerable intuitive appeal. 

A person subtracts last week's weight from today's weight and talks of
having gained or lost five pounds. Yet change scores have more than
their share of conceptual problems. Weights are comparable (a
two-hundred-pounder outweighs a one-hundred-pounder regardless of his other
traits), but changes are not necessarily comparable: a loss of twenty-five
pounds may be a godsend for one individual but a disaster for
another. Even in cases where changes in one direction are preferred,
certain comparisons of changes appear inappropriate. For example, an
instructor may grade physical education students on their improvement
in running the mile. All of the students running an eight-minute mile
at the beginning of the course may cut more than a minute from their
times; none of the four-minute milers is likely to improve by more
than a few seconds. Clearly, the eight-minute milers "improved" their
time by more seconds than did the four-minute milers. Yet no instructor
would give A's to the slowest runners and F's to the fastest, regardless
of his commitment to the concept of grading on improvement. Somehow,
these "improvements" are not comparable for the purposes of evaluation.
This inability to compare changes directly at different points of the
scale, even with ratio scales, is the fundamental problem of the
measurement of change.

The comparability problem is related to the fact that change scores
are generally correlated with initial status. When change and initial
status are negatively correlated, low scorers have an advantage in the
sense that they are likely to gain more. Similarly, in rarer instances
when change and initial status are positively correlated, the initially
high-scoring individuals have the advantage. The comparability problem
can be alleviated by using either change quotients or residual change
scores, both of which are independent of initial status. Change quotients
and residuals are perfectly correlated with each other under certain
circumstances. Residuals are to be preferred when the data meet certain
assumptions, which will be outlined in an ensuing section.

Methods for estimating the true change and true-score residual when
the data are unreliable will be presented, and the residual procedure will
be extended to the comparison of groups, such as school systems. The
reliability of change scores and residuals is discussed, and procedures
are suggested for constructing confidence intervals for residuals.

Change scores have also been used in statistical analyses of the 
determinants of change. A brief review of this use of change scores is
provided which suggests that change scores are unnecessary and often even 
inappropriate for statistical studies. Alternative statistical procedures 
are suggested. 

The Notational System

In general, capital letters refer to true scores or errorless 
scores, and small letters refer to the corresponding fallible scores. 

All scores are expressed in terms of deviation scores, i.e., their
grand mean has been subtracted from them. This simplifies the
computation because the mean of all deviation scores is zero. It does
not affect the generality of any formula or proof, since deviation
scores can be converted back to the original scores whenever necessary.
X and x represent initial status,

Y and y represent final status, and
W and w represent a variable other than X or Y.

$S_X^2$ represents the variance of X; $s_x^2$ represents the variance of x.

$R_{XY}$ represents the correlation between X and Y; $r_{xy}$ the correlation between x and y.

Since the covariance of two true scores equals the expectation of
the covariance between their corresponding fallible scores, both
covariances will be represented by a capital C, such as $C_{xy}$.

Regression weights will be represented by A and B for true scores
and a and b for fallible scores. Subscripts will be used unless the
context indicates which regression weight is desired. $B_{YX.W}$ is the
weight given X when both X and W are used to predict Y.

Other symbols will be defined as they appear. 

THE RELATIONSHIP BETWEEN CHANGE AND OTHER VARIABLES 

An early and continuous interest in psychology has been the rela- 
tionship between change and other variables -- how can change be 
predicted? Thorndike (1924) cites six early studies on the relation
of initial ability to gain. Other researchers, like Woodrow (1946),
correlated the "ability to learn" with other variables such as
intelligence test scores. An examination of the weaknesses of the common
statistical approaches suggests that change scores are unnecessary and
often even inappropriate for statistical studies. Alternative
statistical procedures are suggested.

The Correlation Between Change and Initial Status 

Most correlations are reduced but not biased by errors in the data:
a positive or negative correlation retains its sign but is smaller in
absolute value. Thorndike (1924) demonstrated that the correlation
between change and initial status is biased in a negative direction
by errors in the pretest, because the pretest error is also present
in the change score but with the opposite sign:

$$x = X + e_x$$

$$g = y - x = G + e_y - e_x$$

where G = Y - X and g = y - x. Consequently, unlike the covariances of
most fallible scores, the covariance of raw gain and raw initial status
does not equal the covariance of the corresponding true scores (assuming
errors uncorrelated with each other and with the true scores):

$$C(g, x) = C(G + e_y - e_x,\; X + e_x) = C(G, X) - s_{e_x}^2$$

Thomson (1924, 1925) and Zieve (1940) suggested analytic procedures
which, in effect, added $s_{e_x}^2$ back to the raw-score covariance before
computing the correlation coefficient (Bereiter, 1963, pp. 6-7).

Thorndike (1966) used parallel pretests to eliminate this bias.

One pretest was used to compute the gain and the other was correlated
with the gain. The average initial-gain correlation increased from
-.20 to +.10. This concern with the initial-gain correlation appears
to be a pseudo-problem, even for true scores. As Thorndike points out,
the correlation is positive only when the posttest variance is sufficiently
larger than the pretest variance:

$$C(X, G) > 0 \iff C(X, Y - X) > 0 \iff R_{XY}S_XS_Y - S_X^2 > 0 \iff R_{XY}\frac{S_Y}{S_X} > 1$$

Hence, the initial-gain correlation does not appear to add anything to
our knowledge. In fact, if Thorndike's analysis is extended further,
the initial-gain correlation issue can be parsimoniously subsumed
under the heading "Predicting Y from X." If $B_{YX}$ ($= R_{XY}S_Y/S_X$) is greater
than one, equal to one, or less than one, the initial-gain correlation
will be correspondingly positive, zero, or negative (Garside, 1956).
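The equivalence can be checked with a few lines of arithmetic, since $C(X, G) = R_{XY}S_XS_Y - S_X^2 = S_X^2(B_{YX} - 1)$. The sketch below uses hypothetical values of $R_{XY}$, $S_X$, and $S_Y$ (the function name is illustrative) and shows that the sign of $C(X, G)$ always matches whether $B_{YX}$ exceeds one.

```python
def initial_gain_covariance(r_xy, s_x, s_y):
    """Return (C(X, G), B_YX) for G = Y - X, computed from R_XY, S_X, S_Y.

    C(X, G) = C(X, Y) - S_X**2 = R_XY*S_X*S_Y - S_X**2 = S_X**2 * (B_YX - 1)
    B_YX    = R_XY*S_Y/S_X
    """
    cov_xg = r_xy * s_x * s_y - s_x ** 2
    b_yx = r_xy * s_y / s_x
    return cov_xg, b_yx

# Hypothetical (R_XY, S_X, S_Y) triples.
cases = [(0.9, 10.0, 12.0), (0.9, 10.0, 10.0), (0.7, 10.0, 15.0), (0.7, 10.0, 13.0)]
for r, sx, sy in cases:
    cov_xg, b_yx = initial_gain_covariance(r, sx, sy)
    print(f"R={r}, S_X={sx}, S_Y={sy}: C(X,G)={cov_xg:+.1f}, B_YX={b_yx:.2f}")
```

With equal pretest and posttest variances, as in the second case, $B_{YX} = R_{XY} \le 1$, so the initial-gain covariance cannot be positive.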

Thorndike used mental age scores instead of I.Q.'s in his study.

He points out that if he had used I.Q. scores with a standard variance at
each age, the correlation between I.Q. at age 8 and the gain in I.Q.
between age 9 and age 12 could be positive only if the age-8 test
correlated more highly with the age-12 test than with the age-9 test,
"a fairly improbable and unnatural event" (p. 126). He might have
added that the correlation between I.Q. at age 8 and the I.Q. gain
from age 8 to age 12 could not possibly be positive as long as the
age-8 and the age-12 variances were equal, since $B_{YX}$ cannot exceed one
unless $S_Y$ is larger than $S_X$.

The difference between a positive and a negative initial-gain
correlation seems more interesting than the difference between a $B_{YX}$
of 1.05 and a $B_{YX}$ of .95; yet both the initial-gain correlation and
$B_{YX}$ are determined by the same data, $S_Y$, $S_X$, and $R_{XY}$. The distinction
between a positive and a negative initial-gain correlation appears to
be artificial and misleading.

Change and Other Variables

Early studies correlated raw change with other variables and
generally obtained near-zero results (Woodrow, 1946). Lord (1963, p.
35) showed that such correlations may be quite misleading. If $R_{GW}$
equals zero, then $R_{GW.X}$ will usually be positive. In other words, for
every subgroup with the same initial status, W will be correlated
positively with change. The question is whether $R_{GW}$ for the total
group or $R_{GW.X}$ for each subgroup with the same initial status is more
meaningful. Lord concludes, "In general, the more extraneous variables
one can hold constant in a scientific study, the clearer the picture.
For this reason, it is not the total group correlation $R_{GW}$ but rather
the partial correlation $R_{GW.X}$ (= $R_{YW.X}$) that is usually of greater
interest" (1963, p. 35). In other words, initial status is held
mathematically constant so that the correlation between initial status
and change does not influence our estimate of the relationship between
change and a third variable, W.

When X is held constant, G is entirely dependent on the value of Y.
Hence, $R_{GW.X}$ is mathematically equivalent to $R_{YW.X}$, but the interpretation
of the two is slightly different. $R_{GW.X}$ is the correlation between
change and W with X held constant, while $R_{YW.X}$ is the correlation between
Y and W with X held constant. The latter expression requires neither
the computation nor even the concept of change scores. Similarly, Werts
and Linn (197?, pp. 18-19) show that $B_{GW.X}$ equals $B_{YW.X}$. Just as the
relationship between change and initial status can be more simply expressed
in terms of $B_{YX}$, so the relationship between change and another variable W
can be more simply expressed in terms of $B_{YW.X}$ or the equivalent partial
correlation, $R_{YW.X}$. (There is no need to compute change scores for
correlational analysis.)

Correcting Partial Correlation and Multiple Regression Coefficients for Unreliability

Unreliability in the data can reverse the sign of a partial correlation
or multiple regression coefficient as well as affecting its size. Consequently,
zero-order correlations should be corrected for attenuation before entering
them in partial correlation or multiple regression formulas.
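The procedure (disattenuate first, then partial) can be sketched as follows. The correlations and reliabilities below are hypothetical, chosen to show a case where the observed partial correlation is positive but the corrected one is negative; the helper names are illustrative, and the correction used is the standard Spearman formula $r_{ab}/\sqrt{r_{aa}r_{bb}}$.

```python
def disattenuate(r_ab, rel_a, rel_b):
    """Spearman's correction for attenuation: r_ab / sqrt(rel_a * rel_b)."""
    return r_ab / (rel_a * rel_b) ** 0.5

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / ((1 - r_ac ** 2) * (1 - r_bc ** 2)) ** 0.5

# Hypothetical values: y and w error-free, pretest x with reliability .70.
r_yw, r_yx, r_wx = 0.35, 0.60, 0.50
r_xx, r_yy, r_ww = 0.70, 1.00, 1.00

observed = partial_r(r_yw, r_yx, r_wx)          # uses fallible correlations
corrected = partial_r(                          # uses disattenuated correlations
    disattenuate(r_yw, r_yy, r_ww),
    disattenuate(r_yx, r_yy, r_xx),
    disattenuate(r_wx, r_ww, r_xx),
)
print(f"observed r_yw.x = {observed:+.3f}, corrected R_YW.X = {corrected:+.3f}")
```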
Base-Free Measures of Change

Thorndike, Bregman, Tilton, and Woodyard (1928) used a crude kind
of partial correlation in their studies of Adult Learning, but the logic
behind the use of partials to study change was not clearly stated until
DuBois (1957), Manning and DuBois (1958, 1962), and Lord (1958, 1963).
Technically, Manning and DuBois used a kind of part correlation. They
partialed initial status out of final status and then correlated the
residuals with other residuals and variables. Their study (1962) showed
that residual gains in learning studies were (a) more highly correlated
with predictors such as aptitude tests than were raw gains, (b) more
highly intercorrelated, and (c) accounted for by a single factor,
which may be a general factor of psychomotor learning. In contrast, Woodrow
(1946) had concluded from a review of studies using raw gains that
intelligence was not related to the ability to learn and that there was
no evidence for a general factor for learning ability. The difference
between these sets of studies is that Manning and DuBois controlled for
initial status through the use of part correlations. They concluded
that (a) "The correlations of residual gain are more consistent and more
in line with what might be logically expected than are the correlations
of crude gains...", (b) "Residual measures of learning seem to have more
in common than do measures of crude gain in the same functions...", and
(c) "The frequently low correlation between change in learned proficiency
and aptitude measures should be re-interpreted in light of logical and
empirical inadequacies of the crude difference criterion of change"
(1962, pp. 318-19).

Manning and DuBois' residual approach does not take errors of
measurement into consideration. Consequently, when x, y, and w are
unreliable, the residual approach gives us $r_{w(y.x)}$ when $R_{W(Y.X)}$ is
required. These coefficients, however, may even have opposite signs.
Tucker, Damarin, and Messick (1966) attempt to correct this problem
through the use of their "base-free measure of change." They partial
true X rather than raw x out of y and correlate this "adjusted"
residual with other variables. This procedure produces a part correlation
which, at least, will always have the same sign as the corresponding
true-score part correlation but, nevertheless, is a systematically biased
estimate of $R_{W(Y.X)}$. For example, assume that Y and W are measured with
perfect reliability but x is not. The estimate of $R_{W(Y.X)}$ would be:

$$\hat{R}_{W(Y.X)} = \frac{r_{yw} - r_{wx}r_{yx}/r_{xx}}{\sqrt{1 - r_{xy}^2/r_{xx}}}$$

The correlation between the base-free measure of change and W would be:

$$r(w,\; y - B_{YX}x) = \frac{r_{yw} - r_{wx}r_{yx}/r_{xx}}{\sqrt{1 + r_{xy}^2/r_{xx}^2 - 2r_{xy}^2/r_{xx}}}$$

which has the same sign as $R_{W(Y.X)}$ but a slightly different denominator.
The point is not that the Tucker et al. approach is wrong; the above
correlation could be adjusted to estimate $R_{W(Y.X)}$ or any other true-score
part or partial correlation that was required. However, it is far simpler
mathematically to correct the appropriate partial correlation or multiple
regression coefficient for attenuation without computing or conceptualizing
in terms of change scores, residuals, or base-free measures of change.
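The two formulas can be compared directly. The sketch assumes the fallible scores are standardized, so the true-score slope is $B_{YX} = r_{xy}/r_{xx}$; the input correlations and function names are hypothetical. Both results share a numerator and therefore a sign, they differ when $r_{xx} < 1$, and they coincide when x is perfectly reliable.

```python
def true_part_r(r_yw, r_wx, r_yx, r_xx):
    """Estimate of R_W(Y.X) when only x is unreliable (Y and W error-free)."""
    return (r_yw - r_wx * r_yx / r_xx) / (1 - r_yx ** 2 / r_xx) ** 0.5

def base_free_r(r_yw, r_wx, r_yx, r_xx):
    """Correlation of w with the base-free residual y - B_YX * x,
    where B_YX = r_yx / r_xx for standardized fallible scores."""
    num = r_yw - r_wx * r_yx / r_xx
    den = (1 + r_yx ** 2 / r_xx ** 2 - 2 * r_yx ** 2 / r_xx) ** 0.5
    return num / den

args = dict(r_yw=0.50, r_wx=0.40, r_yx=0.60, r_xx=0.80)  # hypothetical
print(true_part_r(**args), base_free_r(**args))
```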
CHANGE SCORES FOR COMPARING INDIVIDUALS

The introduction suggested that the correlation between change and
initial status made it inappropriate to use change scores to evaluate
individuals with different initial scores. An analogous problem
occurred in the development of intelligence test scores. The first
intelligence tests were scored in terms of "mental ages." A higher
Mental Age (M.A.) meant the ability to answer more items correctly,
but Mental Ages were not comparable in other ways for children with
different chronological ages. For example, a Mental Age of seven is
above average for a five-year-old, but below average for a nine-year-old.
To make comparisons between children of various ages more meaningful,
an Intelligence Quotient, or I.Q., was defined as one hundred
times the ratio of Mental Age to Chronological Age (C.A.):

$$\text{I.Q.} = 100\,\frac{\text{M.A.}}{\text{C.A.}}$$

This "ratio" I.Q. was still not completely comparable, since it did not
have the same standard deviation for all chronological ages. Hence,
an I.Q. of 120 might mean the 95th percentile at one age and the 90th
percentile at another age. More recent intelligence tests have used
deviation I.Q.'s, which have the same standard deviation for all ages
(Mehrens & Lehmann, 1969, p. 78).

It is important to consider very carefully what a deviation I.Q.
score means and what it does not mean. Suppose that the deviation
I.Q.'s have a standard deviation of 16 for all age groups. Then an
I.Q. of 116 means that the person scored at the 84th percentile of the
norm group for his age. If John has an I.Q. of 116 and Bill has an
I.Q. of 84, John is above average for his age group and Bill is below
average for his age group. However, one does not know whether John
or Bill had a higher raw score unless one knows their chronological
ages as well as their I.Q.'s. The I.Q.'s are simply a comparison of
individuals while mathematically holding their age constant. Their
ages are not "empirically" held constant, because John's vocabulary at
the age of five is not compared with Bill's vocabulary at the same age.
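The percentile interpretation follows from the normal curve. The sketch below assumes deviation I.Q.'s are normally distributed within each age group with mean 100 (the mean is an assumption; the text specifies only the SD of 16), and the function name is illustrative. An I.Q. of 116 lies one SD above the mean, at about the 84th percentile; 84 lies one SD below, at about the 16th.

```python
from statistics import NormalDist

def deviation_iq_percentile(iq, mean=100.0, sd=16.0):
    """Percentile rank within one's own age group, assuming deviation
    I.Q.'s are normal with the given mean and standard deviation."""
    return 100 * NormalDist(mean, sd).cdf(iq)

for name, iq in [("John", 116), ("Bill", 84)]:
    print(name, iq, round(deviation_iq_percentile(iq), 1))
```

Nothing in the calculation uses raw scores or ages, which is the point of the paragraph: the percentile compares each child only with his own age group.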

Similarly, Change Quotients (C.Q.'s) could be computed by holding
initial status constant rather than age. Take, for example, the
physical education instructor discussed in the introduction. He could
grade his students on their improvement in running the mile by
separating the students into groups according to their initial time and
assigning C.Q.'s on the basis of each student's position within his
own group. Runners who finished at the 84th percentile of their group
would be assigned a C.Q. of 116.
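A grouped Change Quotient can be sketched as follows. This is a hypothetical helper, not a procedure from the source: it bins individuals by rounding `initial / width` to the nearest integer, standardizes final scores within each bin, and rescales to mean 100 and SD 16. It uses the within-group z score in place of the percentile (the two coincide under normality), assumes a higher final score is better (so speeds rather than times, or negated times), and leaves the C.Q. undefined (`None`) for groups with fewer than two members or no spread.

```python
from collections import defaultdict
from statistics import mean, pstdev

def change_quotients(initial, final, width):
    """Hypothetical sketch: C.Q. = 100 + 16 * z, with z computed from the
    final scores of persons whose initial scores fall in the same bin."""
    groups = defaultdict(list)
    for i, x in enumerate(initial):
        groups[round(x / width)].append(i)  # bin index by initial status
    cq = [None] * len(initial)
    for members in groups.values():
        if len(members) < 2:
            continue  # no comparison group to standardize against
        ys = [final[i] for i in members]
        m, sd = mean(ys), pstdev(ys)
        if sd == 0:
            continue  # identical final scores: z is undefined
        for i in members:
            cq[i] = 100 + 16 * (final[i] - m) / sd
    return cq

# Hypothetical initial and final speeds (miles per hour).
print(change_quotients([7.5, 7.6, 7.4, 10.0], [8.0, 8.6, 7.7, 10.1], width=0.5))
```

The small, single-member bins this produces illustrate the sampling-error problem discussed next.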

In this approach, the runner's C.Q. is derived by comparing him
to other runners with the same initial time. Unless this group is
very large, sampling error may seriously affect his C.Q. The
sampling error becomes progressively more serious as the size of the
comparison group decreases. Some grouping is possible, e.g., 8 minutes
± 15 seconds, but any attempt to group individuals with very different
initial scores may defeat the purpose of computing change quotients.

An approach is required which would decrease the sampling error by
permitting the use of all of the data in assigning a C.Q. to a given
individual. If the data meet the bivariate normal assumptions, then
the Manning and DuBois residuals (1962) provide such an approach by
simply partialing initial status, X, out of final status, Y, and using
the residual, Q:

$$Q = Y - B_{YX}X$$

If X and Y have a bivariate normal distribution, the Q's will be a
normally distributed random variable with a zero mean and a constant
variance at all levels of X, and will be independent of X.

Pretest and posttest times for running the mile will not meet
the bivariate normal assumptions because the variance of Y (or Q) is
not likely to be equal at all levels of X. An appropriate nonlinear
transformation of the data is required. Fortunately, running
speed is one such transformation (e.g., if John runs the mile in 6
minutes, his average speed is 10 miles per hour). Pretest and posttest
speeds can plausibly be assumed to approximate a bivariate normal
distribution. Consequently, speed will be used rather than time for
computing the residuals.

Assume that there is an infinite population of individuals, and
that their initial status, X, and their final status, Y, have a perfect
bivariate normal distribution. It is easy to show that Change Quotients
and residuals computed for this population would be perfectly
correlated.

The Change Quotient for persons with an initial status $X_k$ would
equal

$$CQ_i = 16\,\frac{Y_i - \bar{Y}_k}{T} + 100$$

where $Y_i$ equals an individual's final status, $\bar{Y}_k$ equals the average
final status of all persons starting with $X_k$, and T equals the standard
deviation of Y given X, which is constant for all levels of X.

The residual, $Q_i$, would equal

$$Q_i = Y_i - B_{YX}X_k = Y_i - \hat{Y}_k$$

where $\hat{Y}_k$ equals $B_{YX}X_k$, the predicted Y for all individuals with initial
status $X_k$.

But for an infinite population with a perfect bivariate normal
distribution, $\hat{Y}_k$ equals $\bar{Y}_k$. Hence,

$$Q_i = Y_i - \bar{Y}_k, \qquad CQ_i = 16\,\frac{Q_i}{T} + 100$$

and residuals and Change Quotients are perfectly correlated.

The Relative Efficiency of Change Quotients and Residuals

To compare the relative efficiency of Change Quotients and
residuals, consider a sample of 100 persons from the infinite
population described above. The Change Quotients and residuals are not
necessarily perfectly correlated, nor are they necessarily equal to
Change Quotients and residuals computed on the basis of the entire
population.

To simplify the analysis, a simple linear transformation of the
Change Quotients will be used, in which each Change Quotient is replaced by

$$\frac{T}{16}\,(CQ_i - 100) = Y_i - \bar{Y}_k$$

In what follows, $CQ_i$ denotes this transformed value. Now the Change
Quotients and residuals computed on the basis of the entire population
are identical, and can jointly be designated as $CQ/Q_i(\text{pop})$, which
represents the value of $CQ/Q_i$ derived from the infinite population. It
is not a population value, since it represents only individual i.

A kind of standard error of measurement can be derived for either
the sample-derived Change Quotients or the sample-derived residuals,
which will represent the extent to which sample-derived values differ
from the population-derived $CQ/Q_i(\text{pop})$. The sample-derived residuals
have a smaller standard error of measurement than do the sample-derived
Change Quotients. The error of measurement for the Change Quotients equals:

$$\begin{aligned}
\text{Error}(CQ_i) &= CQ_i - CQ/Q_i(\text{pop}) \\
&= Y_i - \bar{Y}_k(\text{sample}) - Y_i + \bar{Y}_k(\text{pop}) \\
&= \bar{Y}_k(\text{pop}) - \bar{Y}_k(\text{sample})
\end{aligned}$$

Hence, the standard error of the Change Quotients equals the standard
error of $\bar{Y}_k$, or

$$\text{S.E.}(CQ) = \text{S.E.}(\bar{Y}_k) = \frac{T}{\sqrt{n_k}}$$

where T equals the standard deviation of Y given X and $n_k$ equals the
number of persons in the sample with the same initial status.

Similarly, the standard error of the residual equals the standard
error of $\hat{Y}_k$ or, from Draper and Smith (1966, p. 22),

$$\text{S.E.}(\hat{Y}_k) = T\sqrt{\frac{1}{N} + \frac{(x_k - \bar{x})^2}{\sum (x - \bar{x})^2}}, \qquad N = \text{total sample size}$$
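The two standard errors can be compared numerically. The sketch below expresses both in units of T and assumes, as a simplification, that $\sum (x - \bar{x})^2 \approx N s_x^2$, so the second term under the root becomes $z_k^2/N$ for a standardized pretest value $z_k$. It also computes the expected size of a comparison group one-half standard deviation wide under normality. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def expected_group_size(z_center, n_total, width=0.5):
    """Expected number of persons whose standardized pretest falls in an
    interval `width` SDs wide centered at z_center, under normality."""
    nd = NormalDist()
    return n_total * (nd.cdf(z_center + width / 2) - nd.cdf(z_center - width / 2))

def se_cq(n_k):
    """S.E. of a grouped Change Quotient in units of T: 1 / sqrt(n_k)."""
    return 1 / sqrt(n_k)

def se_residual(z_k, n_total):
    """S.E. of a residual in units of T, assuming sum((x - xbar)**2) ~ N * s_x**2."""
    return sqrt(1 / n_total + z_k ** 2 / n_total)

N = 100
for z, n_k in [(0, 20), (2, 3)]:  # group sizes rounded to whole persons
    print(f"z={z}: expected n_k={expected_group_size(z, N):.1f}, "
          f"SE(residual)={se_residual(z, N):.2f}T, SE(CQ)={se_cq(n_k):.2f}T")
```

With group sizes rounded to twenty and three persons, the printed values should agree with the standard errors the text reports for $Z_{x_k} = 0$ and $Z_{x_k} = \pm 2$.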

Since a finite sample is being used rather than an infinite population,
some grouping will be necessary to compute the Change Quotients. Assume
an interval size of one-half of a standard deviation. The comparison group
centered around the mean of x ($Z_{x_k} = 0$) would then be expected to contain
approximately twenty persons, and the comparison group centered around an
x value 2 standard deviations away from the mean ($Z_{x_k} = \pm 2$) would be
expected to contain approximately three persons. With this grouping and a
sample size of one hundred, the standard error of the residuals would be
less than one-half the standard error of the Change Quotients. For
example, when $Z_{x_k} = 0$, the standard errors are .10T and .22T for the
residuals and Change Quotients, where T, once again, is the standard
deviation of Y given X. For $Z_{x_k} = \pm 2$, the corresponding values are
.22T and .58T. Since the grouping procedure outlined above introduces
a small bias in the estimate of Change Quotients, the standard